Scale and Concurrency of Massive File System Directories

نویسندگان

  • Swapnil Patil
  • Christos Faloutsos
  • Gregory R. Ganger
  • Mahadev Satyanarayanan
چکیده

File systems store data in files and organize these files in directories. Over decades, file systems have evolved to handle increasingly large files: they distribute files across a cluster of machines, they parallelize access to these files, they decouple data access from metadata access, and hence they provide scalable file access for high-performance applications. Sadly, most cluster-wide file systems lack any sophisticated support for large directories. In fact, most cluster file systems continue to use directories that were designed for humans, not for large-scale applications. The former use-case typically involves hundreds of files and infrequent concurrent mutations in each directory, while the latter use-case consists of tens of thousands of concurrent threads that simultaneously create large numbers of small files in a single directory at very high speeds. As a result, most cluster file systems exhibit very poor file create rate in a directory either due to limited scalability from using a single centralized directory server or due to reduced concurrency from using a system-wide synchronization mechanism. This dissertation proposes a directory architecture called GIGA+ that enables a directory in a cluster file system to store millions of files and sustain hundreds of thousands of concurrent file creations every second. GIGA+ makes two contributions: a concurrent

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Swapnil Patil - Ph.D. Dissertation

File systems store data in files and organize these files in directories. Over decades, file systems have evolved to handle increasingly large files: they distribute files across a cluster of machines, they parallelize access to these files, they decouple data access from metadata access, and hence they provide scalable file access for high-performance applications. Sadly, most cluster-wide fil...

متن کامل

Towards a Grid File System Based on a Large-Scale BLOB Management Service

This paper addresses the problem of building a grid file system for applications that need to manipulate huge data, distributed and concurrently accessed at a very large scale. In this paper we explore how this goal could be reached through a cooperation between the Gfarm grid file system and BlobSeer, a distributed object management system specifically designed for huge data management under h...

متن کامل

Using File-Grain Connectivity to Implement a Peer-to-Peer File System

Recent work has demonstrated a peer-to-peer storage system that locates data objects using O logN messages by placing objects on nodes according to pseudo-randomly chosen IDs. While elegant, this approach constrains system functionality and flexibility: files are immutable, directories and symbolic names are not supported, data location is fixed, and access locality is not exploited. This paper...

متن کامل

Scale and Concurrency of GIGA+: File System Directories with Millions of Files

We examine the problem of scalable file system directories, motivated by data-intensive applications requiring millions to billions of small files to be ingested in a single directory at rates of hundreds of thousands of file creates every second. We introduce a POSIX-compliant scalable directory design, GIGA+, that distributes directory entries over a cluster of server nodes. For scalability, ...

متن کامل

Multi-Directory Hashing

We present a new dynamic hashing scheme for disk-based databases, called Multi-Directory Hashing (MDH). MDH uses multiple hash directories to access a file. The size of each hash directory grows dynamically with the file size. The advantages of MDH are enhanced concurrency, improved bucket utilization and smaller total directory size than single-directory hashing. The expected utilization of MD...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2015